Many perspectives on the role of evolution in human health include nonempirical assumptions concerning the adaptive 
evolutionary origins of human diseases. Evolutionary analyses of the increasing wealth of clinical and population genomic 
data have begun to challenge these presumptions. In order to systematically evaluate such claims, the time has come to 
build a common framework for an empirical and intellectual unification of evolution and modern medicine. We review 
the emerging evidence and provide a supporting conceptual framework that establishes the classical neutral theory of 
molecular evolution (NTME) as the basis for evaluating disease-associated genomic variations in health and medicine. For 
over a decade, the NTME has already explained the origins and distribution of variants implicated in diseases and has 
illuminated the power of evolutionary thinking in genomic medicine. We suggest that a majority of disease variants in 
modern populations will have neutral evolutionary origins (previously neutral), with a relatively smaller fraction 
exhibiting adaptive evolutionary origins (previously adaptive). This pattern is expected to hold true for common as well as 
rare disease variants. Ultimately, a neutral evolutionary perspective will provide medicine with an informative and ac- 
tionable framework that enables objective clinical assessment beyond convenient tendencies to invoke past adaptive 
events in human history as a root cause of human disease. 



Since the announcements of Darwin's great discovery in the 
19th century, many scientists have shown an interest in the 
applications of evolutionary principles to medicine (Nesse and 
Williams 1996; Stearns 1999; Trevathan et al. 1999; Kumar et al. 
2011). This interface between evolutionary biology and medi- 
cine became more broadly apparent after the establishment of 
DNA as the hereditary material in the 1950s, and subsequent 
advances in DNA sequencing technologies over the past two 
decades. Scores of investigations have highlighted the potential 
of applying evolutionary principles in human health and disease 
(McKenna et al. 1993; Miller and Kumar 2001; Ramensky et al. 
2002; Nesse et al. 2006; Kumar et al. 2011). These include recent 
evolutionary insights into the origins of specific, clinically rele- 
vant phenotypes, such as lactose tolerance and malaria resistance 
(Sabeti et al. 2006; Tishkoff et al. 2006; Williamson et al. 2007; 
Tung et al. 2009). The potential to apply evolutionary methods 
and principles toward clinical utility is gaining in significance 
with increasing public availability, and declining production 
costs, of sequence data measured from individuals and pop- 
ulations (Feero et al. 2010; Kumar et al. 2011). Consequently, 
efforts to incorporate evolutionary biology into medical educa- 
tion are now underway and expanding (Nesse et al. 2009; Stearns 
et al. 2010). 

As the interface between evolutionary biology and geno- 
mic medicine progresses into the mainstream of clinical research 
and training, it is becoming important to establish conceptual 
frameworks for exploring this interface based on sound null hy- 
potheses. Such frameworks are a prerequisite for the scientific 
evaluation of claims concerning the role or utility of evolutionary 
biology in medicine (Miller and Kumar 2001; Gluckman et al. 
2009; Nesse et al. 2009; Kumar et al. 2011). Here we suggest, with 
strong support from the empirical data, that the classical neutral 
theory of molecular evolution (NTME) (Kimura 1983) provides 
a ready, validated framework for generating a null hypothesis 
required for the discovery, characterization, and clinical evalua- 
tion of human disease-associated variation. For the past 50 yr, the 
NTME has firmly established principles that explain the fate of 
the majority of new mutations in a population, and the emer- 
gence of differences within and among species (see below). Fur- 
ther, it has served as the fundamental basis for developing 
essential methods for identifying genomic tracts of adaptive 
change, predicting functionally important parts of the human 
genome, timing the origins of novel genes and mutations, and 
identifying genomic elements associated with human diseases 
(Nei and Kumar 2000; Felsenstein 2004; Yang 2006; Lynch 2007; 
Kumar et al. 2011). 

Here, we first introduce the primary tenets of the NTME 
as they apply to population and species differences, and then 
review the evidence that these tenets explain the nature and 
emergence of disease-associated variants in modern human 
populations. We also elaborate on the clinical implications of 
this neutral theory perspective, which integrates population 
and species evolution patterns for better diagnosis and treat- 
ment of human diseases with an etiological basis in genetic 
variation. 
Neutral theory for personal, population, 
and species differences 

Under the NTME, the primary forces that give rise to and maintain 
variation within populations are mutation, recombination, and 
random genetic drift (for reviews, see Nei 1987; Lynch 2007; Nei 
et al. 2010). Mutations are de novo genomic alterations in an in- 
dividual which, when passed on to offspring through the germline 
(i.e., sperm and eggs), may contribute to disease and appear as 
disease-associated variants in contemporary populations. Muta- 
tions also occur in somatic cells and mitochondrial DNA, con- 
tributing to many monogenic and complex diseases (Taylor and 
Turnbull 2005; Erickson 2010). Recombination breaks up associa- 
tions between alleles that co-occur on the same chromosome, 
which can lead to or dissociate detrimental juxtapositions of allelic 
variants (Keightley and Otto 2006). 

Kimura (1983) and Nei et al. (2010) formalized and advocated 
the NTME, which asserts that random genetic drift dictates the fate 
of a newly arisen mutation over time in a population, and that 
a majority of (personal, population, or species) genetic differences 
observed today are selectively neutral. In a population with N 
randomly mating diploid individuals, a novel mutation will have 
a population frequency of 1/(2N). If the mutation is selectively 
neutral, then it will have a probability of 1/(2N) to become fixed in 
a population, and genetic drift will govern changes in its fre- 
quency. If a mutation is selectively deleterious, then the expected 
population frequency in subsequent generations is less than the 
current population frequency due to the effects of purifying se- 
lection. Therefore, de novo mutations and variations with signif- 
icant negative consequences will not rise to high frequencies, and 
they will often lead to diseases that affect fecundity depending on 
the age of their onset. Consequently, these variants will be ob- 
served in low frequencies within populations. Empirical support 
for these tenets is found abundantly in the observed patterns of 
genetic polymorphism in diverse population genomic data. For 
example, an analysis of dense population sequencing data reveals 
that the forces of purifying selection and neutral variation alone 
are able to explain overall genome-wide patterns of genetic varia- 
tion (e.g., Hernandez et al. 2011; Lohmueller et al. 2011). Func- 
tionally important regions of the genome show much lower de- 
grees of genetic variation than others with no known function 
(e.g., coding versus noncoding; Fig. 1A; Goode et al. 2010; Mu et al. 
2011). Exonic variants that have become common in the pop- 
ulation (frequency >5%) are overabundantly synonymous (do not 
change the amino acid encoded). This is because the absolute 
numbers of such synonymous variants exceed nonsynonymous 
(amino acid altering) variants even though the number of posi- 
tions that can receive nonsynonymous mutations is three times 
greater than the positions that can receive the synonymous mu- 
tations (The 1000 Genomes Project Consortium 2010). Further- 
more, rare variants (frequency <1%) in exonic regions are enriched 
for deleterious, nonsynonymous variants (Fig. 1B; Li et al. 2010; 
Marth et al. 2011). 
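
The drift prediction for a new neutral mutation can be checked with a small simulation. The following is a minimal sketch, assuming an idealized Wright-Fisher population (constant size, non-overlapping generations, no selection); the function name, the population size, and the number of replicates are illustrative choices and are not taken from the studies cited above.

```python
import random


def neutral_fixation_fraction(n_diploid, replicates, seed=1):
    """Estimate the fixation probability of a new neutral mutation.

    Assumes an idealized Wright-Fisher population of constant size:
    each generation, the 2N gene copies are drawn with replacement
    from the previous generation (pure genetic drift, no selection).
    """
    rng = random.Random(seed)
    copies = 2 * n_diploid
    fixed = 0
    for _ in range(replicates):
        count = 1  # a new mutation starts as one copy, frequency 1/(2N)
        while 0 < count < copies:
            freq = count / copies
            # Binomial sampling of mutant copies in the next generation.
            count = sum(1 for _ in range(copies) if rng.random() < freq)
        if count == copies:
            fixed += 1
    return fixed / replicates
```

Calling `neutral_fixation_fraction(20, 5000)` should return a value close to 1/(2N) = 0.025, with most replicates ending in loss of the mutation within a few generations.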

Within the NTME framework, adaptive (beneficial) mutations 
have a higher probability of being retained in the population, 
depending on the selective advantage they confer (Eyre-Walker 
and Keightley 2009). However, adaptive variants are expected to 
occur less frequently in a population in comparison with neutral 
polymorphisms (Kimura 1983; Boyko et al. 2008; Eyre-Walker and 
Keightley 2009). This occurs despite the fact that any adaptive 
variant has a much higher chance of rising to high frequencies. 
We estimate that the proportion of high-frequency variants with 
Figure 1. Observed patterns of population polymorphisms that are 
consistent with the neutral theory of molecular evolution. (A) Proportions 
of single nucleotide variants (SNVs) with a rejected substitution (RS) score 
greater than 3 in coding and noncoding regions (Goode et al. 2010). A 
lower degree of genetic variation is permitted by purifying selection in 
functionally important regions (coding) of the genome as compared with 
noncoding regions with no known function. (B) Proportions of nonsynonymous 
SNVs predicted to be damaging by SIFT and PolyPhen-2 
programs (Marth et al. 2011), which show an enrichment of deleterious 
(nonsynonymous) variants among the rare polymorphisms (allele frequency <1%). 



neutral origins in a population will far exceed, by an order of 
magnitude, the proportion of high-frequency adaptive variants, 
even under assumptions that favor adaptive forces (Fig. 2). 
Consequently, a vast majority of high-frequency variants observed in 
modern populations are more likely to be selectively neutral. 
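
The order-of-magnitude argument can be illustrated numerically. The sketch below is not the authors' calculation: it assumes the standard Poisson random field frequency density for selected mutations, which may differ in detail from equations 11 and 12 of Sawyer and Hartl (1992), and it treats `gamma` as a population-scaled selection coefficient whose exact scaling convention is an assumption. Function names and the integration settings are hypothetical.

```python
import math


def hf_flux(gamma, lo=0.5, hi=0.9, steps=2000):
    """Relative expected number of mutations at frequency lo..hi.

    Integrates (midpoint rule) the assumed density, per unit mutation input,
        g(x) = (1 - exp(-2*gamma*(1 - x))) / ((1 - exp(-2*gamma)) * x * (1 - x)),
    which reduces to the neutral density 1/x as gamma approaches 0.
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        if abs(gamma) < 1e-9:
            density = 1.0 / x  # neutral limit
        else:
            # expm1(-a) / expm1(-b) equals (1 - e^-a) / (1 - e^-b)
            density = math.expm1(-2 * gamma * (1 - x)) / (
                math.expm1(-2 * gamma) * x * (1 - x)
            )
        total += density * width
    return total


def adaptive_share(f_adaptive, gamma):
    """Share of high-frequency variants that are adaptive, assuming a
    fraction f_adaptive of new mutations is adaptive and the rest neutral."""
    adaptive = f_adaptive * hf_flux(gamma)
    neutral = (1.0 - f_adaptive) * hf_flux(0.0)
    return adaptive / (adaptive + neutral)
```

Under these assumptions, `adaptive_share(0.01, gamma)` stays at a few percent and stops growing once `gamma` is large, which is the qualitative behavior described for Figure 2; the sketch is not expected to reproduce the published values exactly.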

In fact, even adaptive variants with significant fitness effects 
may follow a fate similar to that of selectively neutral mutations if 
the effective population size (N_e) is small. This happens because the 
effectiveness of selection for a mutation depends on the product of its fitness 
effect, represented by the selection coefficient (s), and N_e. Mutations 
with N_e s ≪ 1 act as "effectively" neutral. Therefore, in populations 
with smaller N_e, the effects of random genetic drift can more easily 
overcome the effects of selection (Welch et al. 2008; Hurst 2009). 
Estimates of N_e for human populations (~3000-10,000) are orders 
of magnitude less than the size of the modern human mating 
population (Tenesa et al. 2007), which is likely due to the occur- 
rence of one or more population bottleneck events during early 
human migration (Voight et al. 2005; Lohmueller et al. 2008). This 
means that beneficial mutations need to have a rather high se- 
lective advantage in order to gain a significantly higher probability 
of rising to high frequencies due to their adaptive effects. Such facts 
underlie the original statement that "the essential part of the 
neutral theory is not so much that molecular mutants are selec- 
tively neutral in the strict sense as that their fate is largely de- 
termined by random drift" (Kimura 1983, p. 34). This also means 
that the original formulation of the neutral theory not only deals 
with the presence of deleterious mutations that are subject to 
negative selection, but also embraces adaptive and nearly neutral 
mutations (Ohta 1992; Hurst 2009), many of whose fate will be 
largely determined by random genetic drift (Nei et al. 2010). 
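
The dependence of a mutation's fate on the product of the selection coefficient and the effective population size can be made concrete with Kimura's (1962) diffusion approximation for the fixation probability. The sketch below assumes genic (additive) selection and a new mutation starting at frequency 1/(2N); the function and parameter names are illustrative and do not come from the text.

```python
import math


def fixation_probability(s, n_e, n_census=None):
    """Diffusion approximation for the fixation probability of a new mutation.

    s        : selection coefficient (s > 0 beneficial, s < 0 deleterious)
    n_e      : effective population size
    n_census : number of diploid individuals (assumed equal to n_e if omitted)
    """
    if n_census is None:
        n_census = n_e
    p0 = 1.0 / (2 * n_census)  # initial frequency of a single new copy
    x = 4 * n_e * s
    if abs(x) < 1e-12:
        return p0  # neutral limit: probability equals initial frequency
    if x < -700:
        return 0.0  # strongly deleterious; also avoids float overflow
    # (1 - exp(-x * p0)) / (1 - exp(-x)), written with expm1 for accuracy
    return math.expm1(-x * p0) / math.expm1(-x)


def relative_to_neutral(s, n_e):
    """Fixation probability divided by the neutral value 1/(2N)."""
    return fixation_probability(s, n_e) * 2 * n_e
```

With `n_e = 10000`, a mutation with `s = 1e-6` (so that the product of `n_e` and `s` is 0.01) fixes at almost the neutral rate, whereas `s = 0.01` gives a fixation probability of roughly 2s, far above the neutral expectation.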

From population variation to species differences 

Under the NTME, population polymorphisms represent the transient 
phase of evolution among species (Kimura 1983). The rate at which 
strictly neutral variants become fixed in a population, 
and thus appear as differences between species, is equal 
to the rate of mutation (Kimura 1968). The functional importance 
and characteristics of a locus dictate the strength of purifying 
selection against mutations occurring at that position. This 
means that a position will be more conserved between species 
Figure 2. Relative occurrence of neutral and adaptive mutations that 
will reach high frequency (HF: 50%-90%) in a population, assuming that 
a fraction f of all new mutations is adaptive. An overwhelming majority of 
HF mutations in a population are expected to be neutral even when the 
adaptive mutations have a very high selective advantage (s, increment of 
fitness) relative to the wild-type allele. The fraction of HF adaptive mutations 
is not expected to exceed 3.5% even when 1% of all new mutations 
in a population are adaptive, and the adaptive allele has a very high se- 
lective advantage. The expected proportion of HF neutral and adaptive 
polymorphisms was estimated by using Sawyer and Hartl (1992) equations 
11 and 12, which describe the equilibrium flux of mutant alleles 
reaching high frequency in an effective population size of 2N. Note that 
increasing the 2Ns will not increase the proportion of adaptive HF poly- 
morphisms, as the elevated probability of entering the high-frequency 
range is balanced by shortening of the transient polymorphic period. Also, 
we have used a rather liberal upper bound on the relative proportion of 
adaptive mutations (f = 1%; dashed line), but the actual fraction is 
expected to be smaller (f= 0.2%; solid line). For simplicity, only neutral 
and adaptive mutations were considered in the above calculations, be- 
cause deleterious mutations are unlikely to rise to very high frequencies 
(>50%) on their own or by hitchhiking with adaptive variation, because 
only a small portion of the genome is hitchhiking by adaptive substitutions 
(Chun and Fay 2011). In any case, the maximum number of deleterious 
variants with high frequency will be similar to that of beneficial variants. 
Also, some low-frequency neutral variants will hitchhike to high fre- 
quency, which will cause a temporary increase of a high-frequency neutral 
derived allele during or immediately after the fixation of the adaptive 
mutation. This will not have any impact on the overall expectations, be- 
cause such increases are followed by a long period of below-average 
abundance of high-frequency variants (Kim and Stephan 2000). 



if substitutions between species occur slower than what would be 
expected under strict neutrality. This fact has been exploited in 
functional genomics extensively, where deviations from neutral 
evolutionary expectations serve as a major tool for identifying 
genomic regions of functional importance. If the interspecific di- 
vergence at a position is less than would be expected under neu- 
trality, it suggests that the site is likely functionally important, and 
that purifying selection has prevented alternative alleles which 
disrupt function from reaching fixation (Goode et al. 2010). 
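
The link between neutral fixation and the mutation rate (Kimura 1968) that underlies these comparisons follows from a short calculation. The derivation below is a sketch in our own notation: \mu is the mutation rate per site per generation, N is the number of diploid individuals, k is the substitution rate, and f_0 is the fraction of mutations at a site that are selectively neutral, with the remainder assumed to be removed by purifying selection.

```latex
\begin{align}
\text{new mutations per generation} &= 2N\mu \\
\Pr(\text{fixation of one neutral mutation}) &= \frac{1}{2N} \\
k_{\text{neutral}} &= 2N\mu \times \frac{1}{2N} = \mu \\
k_{\text{constrained}} &= 2N f_0 \mu \times \frac{1}{2N} = f_0\,\mu \le \mu
\end{align}
```

The population size cancels, so the neutral substitution rate depends only on the mutation rate, and a site where only a fraction of mutations is tolerated diverges between species more slowly than the neutral expectation.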

The degree of purifying selection reflected in the interspecies 
evolutionary rate of substitution at a position shows a strong re- 
lationship with the population frequencies of alleles at poly- 
morphic loci (Fig. 3; Subramanian and Kumar 2006; Kumar et al. 
2009). Similarly, 10-fold more synonymous substitutions (per base 
pair) have occurred between human and chimpanzee proteins 
than nonsynonymous substitutions, a trend that is universally 
observed among closely and distantly related species (Clark et al. 
2003; Subramanian and Kumar 2006). Furthermore, the magni- 
tude of biochemical difference in amino acid variation within 
a population is very similar to that observed among amino acids 



substituted between species (Subramanian and Kumar 2006). And 
the degree of sequence divergence at fourfold-degenerate sites is 
similar to that of intronic regions and pseudo-genes, because their 
mutations are not generally expected to have any significant im- 
pact on function (Zheng et al. 2007). 

In the above, we have outlined neutral theory as it applies to 
the medically relevant genomic variation discussed below. A more 
detailed review of the biological and mathematical aspects of the 
neutral theory in the genomics era is found in Nei et al. (2010). 

Neutral theory's implications for medically 
relevant variants 

All population, personal, and disease-associated variants observed 
today have been subjected to the evolutionary processes of genetic 
drift and natural selection, and their population frequencies are 
a product of these processes. Their unification under the NTME provides the 
theoretical basis upon which hypotheses about the distribution 
of these mutations — including simple, complex, mitochondrial, 
and somatic disease-associated mutations — can be developed and 
evaluated. 

Evolutionary distributions of disease-associated variants 

The NTME predicts that putatively causal disease variants in 
functional regions (e.g., protein-coding regions) would be ob- 
served disproportionately at slow evolving positions in the ge- 
nome, which are constrained by purifying selection against 
harmful mutations. Empirical studies using large samples of 
Mendelian disease-associated and neutral variation data support 
these expectations (Miller and Kumar 2001; Blekhman et al. 2008; 
Kumar et al. 2011). A vast majority of nonsynonymous single 
nucleotide variants (nSNVs) associated with Mendelian diseases 
occur at highly evolutionarily constrained positions, whereas 
nSNVs found in healthy individuals occur in more variable posi- 
tions (Fig. 4A). 
Figure 3. Relationship between the cross-species evolutionary rate and 
minor allele frequency (MAF) for nSNVs. Slower-evolving positions host 
nSNVs with lower MAFs, on average. MAFs for 15,043 neutral nSNVs 
from the HumVar data set were obtained for one HapMap population 
(CEU) (The International HapMap 3 Consortium 2010). Long-term evo- 
lutionary rates were estimated using the protein sequence alignment from 
the UCSC Genome Browser resource (Kumar et al. 2009; Fujita et al. 
2011). For each evolutionary rate category, the median MAF is plotted 
along with error bars (2 SEM). 



Genome Research 1 385 



www.genome.org 



Dudley et al. 





[Figure 4 panel titles: Mendelian Nonsyn.; Mendelian Nonsense; Mitochondrial Nonsyn.; Cancer Nonsyn.; Complex Nonsyn. Legend in each panel: Neutral, Disease. Y-axis: cumulative frequency (0%-100%). X-axis: Evolutionary Rate (substitution/site/byrs), 0-6.]

Figure 4. The cumulative frequency distribution of disease-associated 
variants (closed circles) and population polymorphisms (dashed lines) over 
long-term evolutionary substitution rates, with darker background indicating 
increasing evolutionary variability. (A) Nonsynonymous variants associated 
with Mendelian diseases; 39,578 variants from HGMD are shown (Stenson 
et al. 2009). (B) Nonsense variants associated with Mendelian diseases; 2806 
variants from HGMD are shown. (C) Nonsynonymous variants in human 
mitochondrion associated with diseases; 190 disease-associated and 964 
polymorphic variants from MITOMAP are shown (Ruiz-Pesini et al. 2007). (D) 
Nonsynonymous somatic mutations associated with cancers; 1207 driver 
variants from CanPredict are shown (Kaminker et al. 2007). (E) Nonsynonymous 
variants associated with complex diseases; 8644 variants from 
VARIMED are shown (Chen et al. 2010). These distributions show that protein 
positions harboring disease-associated variants tend to evolve at a much 
slower rate (lighter background) than positions harboring population poly- 
morphisms. This pattern is the most distinct for nonsynonymous variants 
associated with Mendelian diseases, cancers, or mitochondrial diseases. The 
distinction becomes less obvious for nonsense variants, and the pattern 
completely disappears for nonsynonymous variants associated with complex 
diseases, the latter of which are expected to exhibit a neutral evolutionary 
pattern due to their modest effects on fecundity. Evolutionary rates in panels 
A, B, D, and E are estimated using multiple alignments of orthologs from 46 
species (Fujita et al. 2011) following the procedure in Kumar et al. (2009). For 
panel C, amino acid sequence alignments for mitochondrial proteins were 
obtained for 28 mammalian species from MamMiBase (Vasconcelos et al. 
2005) and the evolutionary rate was estimated following Kumar et al. (2009). 



Variants that introduce premature stop codons (i.e., 
nonsense variants) also occur with a greater preponderance at 
highly conserved positions when they are associated with diseases 
(Fig. 4B; see also Zia and Moses 2011). However, the attenuating 
effects of various cellular processes (e.g., nonsense-mediated decay) 
may enable the persistence of nonsense mutations at relatively 
conserved positions (Frischmeyer and Dietz 1999; Holbrook et al. 
2004). Disease mutations affecting mitochondrial 
proteins also show a similar pattern (Fig. 4C; see also Bhardwaj 
et al. 2009; Montoya et al. 2009). Somatic cell mutations impli- 
cated as having a causal role in the pathogenesis of cancer (i.e., 
driver mutations) exhibit a trend similar to Mendelian disease 
variants (Fig. 4D; Kaminker et al. 2007; Forbes et al. 2008). This 
corresponds with the observation that driver mutations are pre- 
ponderant at protein functional sites (Dixit et al. 2009; Izarzugaza 
et al. 2009; Talavera et al. 2010), and that they occur in regions that 
have significant consequences for germline fitness in addition to 
the somatic fitness of clonally expanding neoplastic cell pop- 
ulations (Fischer et al. 2011). This is different from noncancerous 
complex diseases, where the genetic basis of each disease is at- 
tributable to many causal variants with very small fitness effects. In 
accordance with the expectations of NTME, nSNVs implicated in 
complex diseases with modest fitness effects exhibit evolutionary 
trends similar to neutral nSNVs (Fig. 4E; see also Thomas and 
Kejariwal 2004). 

Since evolutionary patterns are shaped by reproductive suc- 
cess, we would expect late-onset diseases, which manifest after 
reproductive age, to leave a weaker evolutionary imprint. There- 
fore, we would expect smaller differences between neutral variants 
and variants involved in late-onset diseases. This is indeed the case, 
as the amino acid variants involved in late-onset diseases show 
greater biochemical severities than those implicated in early-onset 
Mendelian diseases (Vitkup et al. 2003; Subramanian and Kumar 
2006). Additionally, mutations implicated in complex diseases, 
many of which manifest their most serious debilitating effects after 
reproductive age, occur at evolutionarily less conserved positions as 
compared with Mendelian disease mutations (Wright et al. 2003; 
Thomas and Kejariwal 2004; Kumar et al. 2011). 

Variants observed in personal exomes also follow the expec- 
tations of the NTME. Novel variants arising by de novo mutations 
in functionally important and, thus, slowly evolving positions in 
an exome are subjected to a greater degree of (purifying) selection. 
Consequently, the numbers of variants observed at evolutionarily 
constrained positions in a personal exome are much smaller than 
expected (observed/expected ratio <1), with an increasingly higher 
overabundance of nSNVs at faster evolving positions (Fig. 5). These 
footprints of neutral evolutionary processes in personal exomes 
ultimately shape the allele frequencies of nSNVs in populations 
(Fig. 3). Furthermore, it is expected that the balance between 
mutations and purifying selection across the human genome will 
result in rare minor alleles at the majority of polymorphic sites, 
which is supported by the analyses of whole genome and exome 
sequence data measured from diverse human populations (Gravel 
et al. 2011; Lohmueller et al. 2011). This would mean that the 
majority of disease-associated variations in the human population 
are likely due to low frequency and rare alleles, and these variants 
are expected to contribute disproportionately to functional pop- 
ulation variance under the NTME (Pritchard 2001; Pritchard and 
Cox 2002; Marth et al. 2011; Zhu et al. 2011; MacArthur et al. 
2012). Recent studies based on evolutionary modeling and pop- 
ulation sequencing confirm this expectation (Dickson et al. 2010; 
Eyre-Walker 2010; Gravel et al. 2011; Marth et al. 2011; Zhu et al. 
Figure 5. The ratio of observed to expected (O/E) numbers of personal 
exome variants (nonsynonymous, nSNVs) found in positions with higher 
to lower degrees of evolutionary conservation (slow to fast evolutionary 
rates). There is an underabundance of nSNVs in slow evolving positions 
(O/E < 1; lighter background) and an overabundance at fast evolving 
positions (O/E > 1; darker background). A linear regression model fits the 
observed data well (R² > 0.95), which denotes a decrease in the strength of 
natural selection with the increasing evolutionary rate, as expected under 
the neutral theory. Sequences of eight HapMap exomes (Ng et al. 2009) 
were used to generate the figure, based on evolutionary rates estimated 
using 46 species alignment of vertebrates and lamprey from the UCSC 
Genome Browser resource (Kumar et al. 2009; Fujita et al. 2011). 



2011) and may explain some of the inability of current genetic 
association studies to detect many causal variants, as they are 
typically not sufficiently powered to detect associations with rare 
alleles at a majority of the polymorphic sites (Marth et al. 2011). 
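
The observed/expected (O/E) comparison for personal exome variants discussed in this section can be sketched as follows. How the expected counts were derived is not spelled out here, so this example assumes, purely for illustration, that variants are expected in proportion to the number of positions in each evolutionary rate category; the function name and any inputs passed to it are hypothetical and are not data from the study.

```python
def observed_expected_ratios(positions_per_bin, variants_per_bin):
    """Return the O/E ratio of variant counts for each rate category.

    positions_per_bin : number of exome positions in each evolutionary
                        rate category (slowest to fastest)
    variants_per_bin  : number of observed nSNVs in the same categories
    Expected counts assume variants fall uniformly across positions.
    """
    if len(positions_per_bin) != len(variants_per_bin):
        raise ValueError("both inputs need one entry per rate category")
    if any(n <= 0 for n in positions_per_bin):
        raise ValueError("every rate category needs at least one position")
    total_positions = sum(positions_per_bin)
    total_variants = sum(variants_per_bin)
    ratios = []
    for n_pos, n_var in zip(positions_per_bin, variants_per_bin):
        expected = total_variants * n_pos / total_positions
        ratios.append(n_var / expected)
    return ratios
```

With equal-sized categories and variant counts that increase from slow to fast evolving positions, the ratios rise from below 1 to above 1, which is the pattern summarized in Figure 5.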

Adaptive histories of disease variants 

Many specific cases of the interplay between evolution and human 
health have been focused on a few detrimental mutations that are 
found in appreciably higher frequencies in modern human pop- 
ulations (Table 1). Such high-frequency disease-associated variants 
can result from past adaptive events, which have become detrimental 
in the modern contexts. For example, a long-held adaptation- 
oriented view on the evolutionary origins of obesity draws from 
the "thrifty gene" hypothesis, which suggests that metabolic ad- 
aptation was driven by periods of food scarcity in hunter-gatherer 
populations (Neel 1962). Such views, and a few individual disease- 
associated variants with seemingly compelling adaptive explana- 
tions (Table 1), have elevated adaptation as a popular evolutionary 
modality for the genome-wide discovery of novel trait-associated 
loci in the human genome (Hindorff et al. 2009; Lopez Herraez 
et al. 2009; Grossman et al. 2010; Wu and Zhang 2010). 

However, such attempts to link many human diseases to 
adaptive evolutionary events face numerous problems. First, many 
signatures of genomic adaptation in humans are expected to be 
attributable to specific local environments, e.g., cattle domestica- 
tion and malaria epidemics (Tishkoff et al. 2006; Coop et al. 2009; 
Novembre and Di Rienzo 2009). In this case, local adaptations 
will not necessarily have significant implications for disease, as 
all subpopulations with different environments carry their own 
landscape of neutral alleles (Lupski et al. 2011), and the pathoge- 
nicity of local adaptations may not be realized until there is 
change in the local environment, or individuals become adversely 
"misplaced" through recent migration, e.g., sickle-cell anemia in 
the African-American population (Campbell and Tishkoff 2008). 



Also, when an adaptive variant is in the process of spreading in 
a given population, nonexpression of the adaptive trait is not 
necessarily deleterious, because the starting population with the 
former wild-type allele may not have been seriously unfit. Indeed, 
the thrifty hypothesis has been challenged in recent population 
genomic and epidemiological studies (Kimura et al. 2007; Prentice 
et al. 2008; Speakman 2008). However, it is plausible that extreme 
pressures on fitness, such as emergent infectious agents, could 
impose strong negative selection on wild-type alleles, and may 
expose deleterious cryptic variation through decanalization 
(Gibson 2009) or hitchhiking (Chun and Fay 2011). 

Selection against previously adaptive variation is only one 
possible evolutionary explanation for the existence of high- 
frequency variants associated with clinical phenotypes. Another 
explanation involves neutral mutations, many of which are ex- 
pected to rise to high frequency within a population due to ran- 
dom genetic drift. Because the negative selection (and thus neu- 
trality) of a variant depends on its genomic and environmental 
contexts, which can change over time and geography, many high- 
frequency variants with neutral origins (previously neutral) may 
also contribute to human diseases today. Population migration is 
well known to affect the neutrality of alleles through altering the 
landscape of population variation in subsequent migrated gener- 
ations. Repeated patterns of migration and settlement can lead 
to reduced genetic diversity in migrated populations due to serial 
founder effects (DeGiorgio et al. 2011). Migration during range 
expansion can also affect frequency distributions, with both neu- 
tral and deleterious variants having higher probabilities of fixa- 
tion on wave fronts of range expansion due to surfing effects 
(Klopfstein et al. 2006; Travis et al. 2007). Migration can lead to 
drastic changes in the environmental context, which can expose 
standing neutral variation to negative selection (Hancock et al. 
2010). Overall, population migration and genetic admixing dis- 
rupt the neutrality of an existing variation, subjecting it to nega- 
tive selection (Stajich and Hahn 2005; Nielsen et al. 2009; Wall 
et al. 2009). Therefore, any high-frequency variant observed to be 
associated with disease in modern contexts would be either pre- 
viously adaptive or previously neutral in origin. 

Using the population genetics aspects of the NTME, one can 
assess the relative contributions of previously neutral versus pre- 
viously adaptive mutations and establish a priori expectations of 
the degree to which the (positive) natural selection is contributing 
to disease today (Fullerton et al. 2000). Such a calculation suggests 
that the proportion of high-frequency variants with neutral origins 
will far exceed the proportion of high-frequency variants due to 
adaptation, as shown earlier (Fig. 2). This observation is corrobo- 
rated by recent population genomic studies, which have reported 
a relatively minor role for adaptation in the emergence of disease- 
associated variants (Boyko et al. 2008; Myles et al. 2008; Coop et al. 
2009; Hofer et al. 2009). An analysis of resequenced data for more 
than a hundred human genomes has shown that genomic di- 
versity around human lineage specific amino acid substitutions 
was no more pronounced than that around synonymous sub- 
stitutions (Hernandez et al. 2011), the latter of which are not 
expected to be subjected to positive selection to any significant 
degree. Also, the genomic diversity around functional regions does 
not appear to be substantially different than the genomic back- 
ground, nor are these regions enriched for alleles that segregate 
among populations (Hernandez et al. 2011). These observations 
suggest that classic selective sweeps for adaptive alleles may not be 
frequent in human evolution. At the same time, we note that the
existing methods may be underpowered to detect the presence
of the limited degree of adaptive variation that is likely to exist,
or different models of adaptive evolution in human history may
remain to be discovered.

Table 1. Some examples of genes harboring human disease mutations for which empirical molecular evidence supports a putative role for positive natural selection acting on emerged gene variants

| Gene | Product | Biological process | Associated disease(s) | Reference(s) |
|---|---|---|---|---|
| [...] | [...] dehydrogenase | [...] | [...] deficiency | [...] |
| HBA, HBB | Alpha- and beta-globin | Synthesis of hemoglobin | Malaria, sickle-cell anemia | [...] |
| LARGE | Mucin-like glycosyltransferase | Regulation of muscle fiber activity | Lassa fever, muscular dystrophy | Sabeti et al. 2007 |
| LCT | Lactase | Lactose metabolism | Lactose intolerance | Tishkoff et al. 2006 |
| MAOA | Monoamine oxidase A | Neurotransmitter | Mental retardation, behavioral disorders | Gilad et al. 2002 |
| MCPH1 | Microcephalin 1 | Putative roles in neurogenesis and regulation of brain size | Primary microencephaly | Evans et al. 2005; Mekel-Bobrov et al. 2005 |
| MMP3 | Matrix metallopeptidase 3 | Breakdown of extracellular matrix | Coronary artery disease | Rockman et al. 2004 |
| OPN1LW | Opsin 1 (cone pigments), long-wave-sensitive | Photoreceptor activity | Color blindness | Verrelli and Tishkoff 2004 |
| PDYN | Opioid neuropeptide precursor prodynorphin | Cognition and pain response | Schizophrenia, cocaine addictions, and epilepsy | Rockman et al. 2005 |
| MEFV (pyrin) | Mediterranean fever | Modulator of innate immunity | Familial Mediterranean fever | Schaner et al. 2001 |
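
The a priori expectation discussed above (the relative contribution of previously neutral versus previously adaptive variants) can be illustrated with a rough calculation. The sketch below is not the calculation behind Fig. 2. It assumes Kimura's diffusion approximation for the fixation probability of a new mutation, uses fixation as a crude proxy for reaching high frequency, and takes the population size, the selection coefficient, and the fractions of new mutations that are neutral or advantageous as purely illustrative placeholders. The result is driven entirely by those assumed inputs.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Fixation probability of a new mutation (initial frequency 1/(2N)).

    Uses Kimura's diffusion approximation and assumes the effective
    population size equals N. Only s >= 0 is handled in this sketch.
    """
    if s < 0:
        raise ValueError("this sketch only covers neutral or advantageous alleles")
    if s < 1e-12:
        return 1.0 / (2 * n)  # neutral limit
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-4.0 * n * s))


def neutral_share(n: int, s_adv: float, frac_neutral: float, frac_adv: float) -> float:
    """Share of fixed variants that were neutral when they rose in frequency.

    frac_neutral and frac_adv are ASSUMED fractions of new mutations that
    are neutral and advantageous; they are placeholders, not estimates.
    """
    neutral = frac_neutral * fixation_probability(0.0, n)
    adaptive = frac_adv * fixation_probability(s_adv, n)
    return neutral / (neutral + adaptive)


# Illustrative inputs only (all four values are assumptions).
share = neutral_share(n=10_000, s_adv=0.01, frac_neutral=0.3, frac_adv=1e-4)
print(f"share of fixed variants with neutral origins: {share:.2f}")
```

Although an advantageous allele is far more likely to fix than a neutral one, advantageous mutations are assumed here to arise far more rarely, and it is this imbalance that lets neutral origins dominate. Changing the assumed fractions changes the answer accordingly.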

Historical trends of the human lifespan suggest that selec- 
tion against previously neutral variants is likely to be frequent in 
modern human populations. Natural selection acts in accordance 
with reproductive success, which is linked to life spans of in- 
dividuals in populations. Many contemporary population alleles 
were once neutral and reached high frequencies when fitness was 
determined at a shorter average life span, e.g., ~30 yr at birth
(Caspari and Lee 2004). However, the average life spans of humans 
have grown significantly and steadily over the past centuries 
(Wilmoth 2000), which exposes nonneutrality of many variants 
only in later life stages (Wright et al. 2003). Thus, one would pre- 
dict an increase in the numbers of mildly deleterious (high and low 
frequency) variants segregating in the population, which is supported
by the recent analysis of standing genetic variation observed in
whole-genome sequence data (Li et al. 2010; Zhu et al. 2011). Because
previously neutral variants vastly outnumber adaptive variations
(Fig. 2), we expect to discover a growing number of previously
neutral variants involved in diseases (Pritchard
2001; Kryukov et al. 2007). Directional selection acting on pre- 
viously neutral variants is difficult to discern from neutral var- 
iation using common statistical methods for detecting selec- 
tive sweeps (Teshima et al. 2006). Nonetheless, signs of selection 
on previously neutral variants are already being reported, e.g., 
Crohn's disease risk alleles in the NOD2 gene (Nakagome et al.
2012). Similar patterns of selection have been inferred for various
traits associated with environmental adaptations (Hancock
et al. 2010). Indeed, adaptive variation will underlie a small
proportion of human genetic disease variation, much of which
will be due to pressures from infectious disease or fitness trade-offs
due to antagonistic pleiotropy (Finch 2010). However, these
adaptive explanations must take the form of alternative hy- 
potheses that should be rigorously evaluated against a neutral 
null hypothesis.

Neutrality has already served as a null hypothesis in the
development and application of advanced approaches for de- 
tecting signals of recent positive selection (Fu and Li 1993; Sabeti 
et al. 2002, 2006; Voight et al. 2006; Kimura et al. 2007; Sabeti 
et al. 2007; Williamson et al. 2007). Notable findings have 
emerged from the use of such approaches, including recent se- 
lection in the lactase (LCT) gene that allows some human pop- 
ulations to digest lactose into adulthood (Tishkoff et al. 2006), 
and adaptations associated with response to infectious disease 
(Ohashi et al. 2004; Corona et al. 2010). Furthermore, Chun and 
Fay (2011) reported that common disease-associated variants 
were more abundant than rare disease-associated variants in 
hitchhiking regions, although both types of variants are 
enriched in hitchhiking regions compared with nonhitchhiking 
regions. As we discover more adaptive substitutions leading to the 
evolutionary lineage of modern humans, as well as population- 
specific molecular adaptations, we expect that the adaptive or- 
igins of many more variants implicated in human diseases will be 
revealed (Pickrell et al. 2009; Corona et al. 2010; Grossman et al. 
2010; Casto and Feldman 2011). Of course, the adaptive variants 
may have a higher probability of becoming pathogenic in altered 
environments and genetic contexts as compared with previously 
neutral alleles. Nevertheless, we expect their absolute numbers will 
be overshadowed by the much greater preponderance of alleles 
having neutral origins. 



Neutral evolution as a tool to identify medically 
relevant variants 

Distinguishing harmful variations in personal genomes from 
those with benign effects during an individual's life course is a 
grand challenge in genomic medicine and personal genomics. Due 
to advances in sequencing technologies, medical practitioners 
are already being asked to diagnose mutations in a patient's ge- 
nome in addition to their physiological symptoms. Presently, 
the complete genomes and exomes of thousands of individual 
humans have been sequenced, with projects underway to expand 
these numbers to tens to hundreds of thousands (Levy et al. 2007; 
Wang et al. 2008; Wheeler et al. 2008; Pushkarev et al. 2009; The 
1000 Genomes Project Consortium 2010). Each personal genome 
carries thousands of protein variants and over a million noncoding 
DNA variants (Ng et al. 2009; Pushkarev et al. 2009; Cirulli and 
Goldstein 2010; Dewey et al. 2011; Kumar et al. 2011). The clinical
assessment of a patient incorporating a personal genome has the 
promise of improving the overall appraisal of a patient's health 
risks and guiding clinical decision-making regarding lifestyle and 
therapeutic intervention (e.g., Ashley et al. 2010). Although high- 
throughput, genome-wide molecular profiling technologies such 
as RNA sequencing (RNA-seq) and regulatory profiling through 
chromatin immunoprecipitation (ChIP-seq) and related techniques
provide powerful modalities for profiling the functional
effects of individual variants (Kasowski et al. 2010), they are not yet 
amenable to regular clinical use, and still typically incorporate 
evolutionary profiling for data interpretation (MacArthur et al. 
2012; Ward and Kellis 2012). In fact, experimental or clinical evi- 
dence regarding the consequence of mutations even commonly 
observed in the population is often lacking, and recent, high- 
throughput efforts to identify common disease variants through 
genome-wide association studies (GWAS) have so far offered lim- 
ited clinical utility (Dupuis and O'Donnell 2007; McCarthy and 
Hirschhorn 2008; Manolio et al. 2009). 



Without any means to establish prior expectations of clinical 
significance on rare and common variants, their incidental dis- 
covery may cause the patient to be subjected to unnecessary 
clinical scrutiny (Kohane et al. 2006). Thus, there is an impending 
clinical need to be able to sort through variants in individual hu- 
man genomes, and highlight those that are most likely to have 
a consequence for an individual's health or course of clinical care. 
Fortunately, the expectations and evolutionary interpretations of 
mutational patterns asserted under the NTME are coherent across 
taxonomic levels (species, populations, and individuals). This 
provides a strong theoretical and empirical basis for establishing 
baseline expectations for the clinical assessment of population 
variation and personal mutations surveyed from contemporary 
personal genomes. Neutral expectations and the interpretation of 
evolutionary patterns across species can improve our ability to 
detect and diagnose the full spectrum of variants present in a per- 
sonal genome (Kumar et al. 2011). For example, single-nucleotide 
estimates of cross-species evolutionary constraint have been used 
to identify causal disease variants for Freeman-Sheldon syndrome 
and Miller syndrome from exome sequencing data (Cooper et al.
2010), and similar estimates of cross-species evolutionary rate
were used to filter for functional rare variants in the clinical
assessment of whole genomes for a family quartet (Dewey et al.
2011).
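
The kind of constraint-based filtering described above can be sketched in a few lines. This is a minimal illustration, not the pipeline of Cooper et al. (2010) or Dewey et al. (2011): the `Variant` record, the `constraint` score (higher meaning more conserved across species), the site names, and both thresholds are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    site: str           # hypothetical site identifier
    allele_freq: float  # population frequency of the variant allele
    constraint: float   # hypothetical cross-species constraint score


def prioritize(variants, max_freq=0.01, min_constraint=2.0):
    """Keep rare variants at constrained sites, most constrained first.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    candidates = [
        v for v in variants
        if v.allele_freq <= max_freq and v.constraint >= min_constraint
    ]
    return sorted(candidates, key=lambda v: v.constraint, reverse=True)


# Toy data: only rare variants at constrained sites survive the filter.
toy = [
    Variant("site_A", 0.002, 4.8),
    Variant("site_B", 0.300, 5.1),  # common, so filtered out
    Variant("site_C", 0.004, 0.3),  # fast-evolving site, so filtered out
    Variant("site_D", 0.001, 2.6),
]
for v in prioritize(toy):
    print(v.site, v.constraint)
```

A hard threshold is the simplest choice; in practice the constraint score could equally be used as a continuous ranking criterion.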

More broadly, computational approaches have been de- 
veloped over the last decade to predict whether a variant impacts 
an individual, especially for protein variants. These tools in- 
corporate evolutionary conservation along with an increasingly 
large number of clinical or biological attributes into their al- 
gorithms (Karchin 2009; Cline and Karchin 2011). Although 
designed primarily for the analysis of coding mutations in can- 
didate genes for the purpose of hypothesis generation, these 
tools are now being utilized for genome-wide scans of coding 
polymorphisms (Doniger et al. 2008; Lohmueller et al. 2008). 
These tools appear to perform quite well for the purpose of 
generating biological hypotheses (Adzhubei et al. 2010; Gonzalez- 
Perez and Lopez-Bigas 2011). However, they frequently produce 
discordant diagnoses even when scanning for simple (mono- 
genic) diseases (e.g., Chun and Fay 2009), indicating a need to 
improve their accuracy for the purpose of clinical diagnosis. 
While most computational tools already incorporate evolution- 
ary information in their algorithms (Ng and Henikoff 2006), 
long-term evolutionary histories of species (that largely capture 
neutral evolutionary trends over millennia) can provide infor- 
mation to forecast when the computational predictions are ex- 
pected to be correct (Kumar et al. 2009, 2011). For example, 
the computational tools are found to be least accurate in
diagnosing disease variation at the fastest-evolving positions
(Kumar et al. 2009), which would be predicted to harbor mostly 
benign mutations under the NTME (Kimura 1983; Subramanian 
and Kumar 2006). 

Another emerging application of evolutionary information
is its use as priors in population-based studies of human complex
diseases having an etiological basis in human mutation.
Evolutionary profiles of causal complex disease mutations are very 
similar to those of neutral population polymorphisms (Fig. 4E; 
Thomas and Kejariwal 2004; Blekhman et al. 2008; Cai et al. 2009), 
because complex disease variants typically have modest effects on 
fecundity, and also often have substantial nongenetic components 
through environmental influences. Therefore, it is frequently 
thought that evolutionary estimates of constraint and neutrality 
are not very informative for identifying complex disease variation.
Consequently, modern GWAS of complex disease traits operate
under the statistical assumption that all measured loci are equally 
likely to harbor alleles that segregate between healthy and disease- 
affected populations. However, we know from the empirical data 
and established evolutionary principles that the expected population
allele frequency spectra for both ancient and derived alleles
are tied to the evolutionary history of positions, and that certain 
categories of disease mutations have a propensity to be found at 
sites with particular evolutionary histories (Cooper et al. 2010; 
Kumar et al. 2011). 

Indeed, it is now clear that long-term evolutionary priors 
estimated from multispecies alignments can improve the chances 
of discovering reproducible variants in genetic disease association 
studies (Dudley et al. 2012). Because of the large difference in the 
depths of the human population coalescence tree (~100,000 yr)
and the time to the most recent common ancestor with another 
species (5-7 million years; Kumar and Hedges 2011), long-term 
(i.e., across species) evolutionary quantities can be generally es- 
timated for each position in the human genome independent of 
human populations. Their use in identifying reproducible vari- 
ants that explain greater proportions of genetic heritability of 
common diseases helps to address the issue of "missing" herita- 
bility of complex diseases (Manolio et al. 2009; Dudley et al. 
2012). Therefore, it is likely that new types of association statistics 
that incorporate evolutionary priors would have increased
power to discover reliable disease-associated loci from both published
and future genetic disease association studies (Dudley
et al. 2012). 
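
One generic way to fold per-site priors into an association test is a weighted multiple-testing correction, sketched below. This is not the statistic of Dudley et al. (2012). It assumes a weighted Bonferroni scheme in which each locus receives a positive prior weight (here a hypothetical evolutionary score), the weights are rescaled to average one, and a locus is declared significant when its p-value divided by its weight falls below alpha/m. The p-values and priors are made-up inputs.

```python
def weighted_bonferroni(p_values, priors, alpha=0.05):
    """Flag loci as significant using prior-weighted Bonferroni thresholds.

    priors are ASSUMED positive per-locus weights (e.g., derived from
    cross-species evolutionary information); they are rescaled so that
    their mean is 1, which keeps the overall error budget at alpha.
    """
    if len(p_values) != len(priors):
        raise ValueError("p_values and priors must have the same length")
    if any(w <= 0 for w in priors):
        raise ValueError("priors must be positive")
    m = len(p_values)
    mean_prior = sum(priors) / m
    weights = [w / mean_prior for w in priors]
    threshold = alpha / m
    return [p / w <= threshold for p, w in zip(p_values, weights)]


# Toy example: the first two loci have the same p-value, but only the
# locus with the larger prior passes the weighted threshold.
p_values = [0.02, 0.02, 0.5, 0.5]
priors = [4.0, 1.0, 1.0, 2.0]
print(weighted_bonferroni(p_values, priors))
print(weighted_bonferroni(p_values, [1.0] * 4))  # equal priors = plain Bonferroni
```

With equal priors the procedure reduces to the usual assumption that all measured loci are equally likely to harbor disease-associated alleles.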

At this point, it is important to highlight some limitations
in the use of evolutionary information as a
tool for diagnosing mutations. For example, as mentioned earlier,
a historically neutral allele may be rendered nonneutral in 
a modern population context due to changes in environmental 
exposures (previously neutral). Additionally, mutations that may 
have no selective impact on fecundity may exhibit real deleteri- 
ous effects subsequent to reproductive age, thereby escaping the 
filter of purifying selection across species. In such cases, long- 
term evolutionary trends are unlikely to inform on the disease 
propensity of variants. It is also possible that compensatory mutations
could render previously deleterious variants neutral, only for
them to revert to their deleterious nature in the context
of a different genetic background (Kondrashov et al. 2002;
Kern and Kondrashov 2004). This complicates neutral interpre- 
tations, as compensatory mutation events can be difficult to detect 
using existing methods. However, compensatory mutations would 
confound interpretation of any type of context-dependent selec- 
tion; thus, neutral and adaptive interpretations would be equally 
affected. 

Nevertheless, the useful application of molecular evolu- 
tion to medicine has advanced beyond the contemporary dis- 
cussions in evolutionary (and Darwinian) medicine of health 
and disease that often place a primary focus on human pop- 
ulation histories and historical environments in understanding 
root causes (Nesse and Williams 1996; Gluckman and Hanson 
2004; Stearns et al. 2010). Indeed, the ultimate achievement of 
a null hypothesis for human disease mutations founded in the 
NTME is to unify evolutionary biology, genomics, and modern 
evidence-based medicine. It is clear from the above discussion 
that evolution serves not as a pathological ghost of human 
history, but rather as a powerful framework for translating fun- 
damental discoveries into clinical utility toward improved pa- 
tient outcomes. 



A healthy perspective on evolution 
in molecular medicine 

It is important to address the role that popular perceptions might 
play in the acceptance of evolutionary thinking in medicine (Nesse 
and Williams 1996; Ewald 2000). Perhaps the most unfortunate 
consequence of a purely adaptation-oriented view of human dis- 
ease mutations is the belief that evolution serves as a wellspring for 
many facets of ill health. Some proponents of this view have even
been so bold as to declare which specific "failures" of evolutionary 
adaptation have caused specific diseases in modern populations 
(Boaz 2002). Such popular views appear to be inspired by the early 
adaptive explanations of human diseases (Allison 1954), where 
a common-variant-common-disease model was proposed to ex- 
plain genetic diseases (e.g., Pritchard and Cox 2002), based on 
an assumption that high-frequency variants arise purely due to 
adaptive advantages. Of course, changes in environment will 
expose some adaptive variants carried by many individuals to 
negative selection and, thus, disease. However, environmental 
changes will also disrupt the neutrality of many variants that 
have reached high frequency by random genetic drift, without 
adaptation. In a population, high-frequency neutral variants 
vastly outnumber adaptive variants, and the contribution of 
these neutral alleles to the common-variant-common-disease 
scenario has not received attention historically. In fact, an NTME-based
explanation does not require one to choose between rare-variant
and common-variant hypotheses for common diseases,
as their relative contributions will be a function of the timing of 
the environmental changes and the allele frequency landscapes of 
the contributing genes and genomic segments. Furthermore, se- 
lection will act on individual rare and common variants in the 
broader context of their joint functional effects across biological 
pathways (Gibson 2012). 

Overall, a neutral view suggests that historical adaptive 
evolutionary events are not the source of illness, but rather that 
evolution is the source of robustness and the reason that 
humans thrive so successfully under a broad range of conditions. 
It rejects the minority assertion that evolution has somehow 
"stopped" for humans because modern medicine and lifestyles 
have slowed adaptation (Belluz 2008; Stearns et al. 2010). A 
neutral perspective asserts, with overwhelming support from 
existing data, that the drumbeat of evolution continues in 
modern populations irrespective of differences in the fitness 
landscapes between modern and ancient human populations. 
In this way, evolution gains relevance on par with other contin- 
uous factors influencing individual health, such as behavior and 
environment. Finally, a crucial step toward integrating evolution 
and medicine rests in orienting empirical findings from evo- 
lutionary studies such that they are easily translated toward 
clinical application.